Abstract

While loopy belief propagation (LBP) performs reasonably well for inference in some Gaussian 
graphical models with cycles, its performance is unsatisfactory for many others. In particular for some 
models LBP does not converge, and in general when it does converge, the computed variances are 
incorrect (except for cycle-free graphs for which belief propagation (BP) is non-iterative and exact). In 
this paper we propose feedback message passing (FMP), a message-passing algorithm that makes use 
of a special set of vertices (called a feedback vertex set or FVS) whose removal results in a cycle-free 
graph. In FMP, standard BP is employed several times on the cycle-free subgraph excluding the FVS 
while a special message-passing scheme is used for the nodes in the FVS. The computational complexity 
of exact inference is O(k^2 n), where k is the number of feedback nodes and n is the total number of
nodes. When the size of the FVS is very large, FMP is intractable. Hence we propose approximate 
FMP, where a pseudo-FVS is used instead of an FVS, and where inference in the non-cycle-free graph 
obtained by removing the pseudo-FVS is carried out approximately using LBP. We show that, when 
approximate FMP converges, it yields exact means and variances on the pseudo-FVS and exact means 
throughout the remainder of the graph. We also provide theoretical results on the convergence and 
accuracy of approximate FMP. In particular, we prove error bounds on variance computation. Based 
on these theoretical results, we design efficient algorithms to select a pseudo-FVS of bounded size. The 
choice of the pseudo-FVS allows us to explicitly trade off between efficiency and accuracy. Experimental 
results show that using a pseudo-FVS of size no larger than log(n), this procedure converges much more 
often, more quickly, and provides more accurate results than LBP on the entire graph. 
I. Introduction

Gaussian graphical models are used to represent the conditional independence relationships among a 
collection of normally distributed random variables. They are widely used in many fields such as computer 
vision and image processing [2], gene regulatory networks [3], medical diagnostics [4], oceanography
[5], and communication systems [6]. Inference in Gaussian graphical models refers to the problem of
estimating the means and variances of all random variables given the model parameters in information 
form (see Section II-A for more details). Exact inference in Gaussian graphical models can be solved by
direct matrix inversion for problems of moderate sizes. However, direct matrix inversion is intractable 
for very large problems involving millions of random variables, especially if variances are sought [4],
[7], [8]. The development of efficient algorithms for solving such large-scale inference problems is thus
of great practical importance. 

Belief propagation (BP) is an efficient message-passing algorithm that gives exact inference results 
in linear time for tree-structured graphs [9]. The Kalman filter for linear Gaussian estimation and the
forward-backward algorithm for hidden Markov models can be viewed as special instances of BP. Though 
widely used, tree-structured models (also known as cycle-free graphical models) possess limited modeling 
capabilities, and many stochastic processes and random fields arising in real-world applications cannot 
be well-modeled using cycle-free graphs. 

Loopy belief propagation (LBP) is an application of BP on loopy graphs using the same local message 
update rules. Empirically, it has been observed that LBP performs reasonably well for certain graphs 
with cycles [10], [11]. Indeed, the decoding method employed for turbo codes has also been shown to
be a successful instance of LBP [12]. A desirable property of LBP is its distributed nature: as in BP,
message updates in LBP only involve local model parameters or local messages, so all nodes can update 
their messages in parallel. 

However, the convergence and correctness of LBP are not guaranteed in general, and many researchers 
have attempted to study the performance of LBP [13]-[16]. For Gaussian graphical models, even if LBP
converges, it is known that only the means converge to the correct values while the variances obtained are
incorrect in general [14]. In [16], a walk-sum analysis framework is proposed to analyze the performance
of LBP in Gaussian graphical models. Based on such a walk-sum analysis, other algorithms have been
proposed to obtain better inference results [17].

LBP has fundamental limitations when applied to graphs with cycles: Local information cannot capture 
the global structure of cycles, and thus can lead to convergence problems and inference errors. There 
are several questions that arise naturally: Can we use more memory to track the paths of messages? Are 
there some nodes that are more important than other nodes in terms of reducing inference errors? Can 
we design an algorithm accordingly without losing too much decentralization? 

Motivated by these questions, we consider a particular set of "important" nodes called a feedback 
vertex set (FVS). A feedback vertex set is a subset of vertices whose removal breaks all the cycles in 
a graph. In our feedback message passing (FMP) algorithm, nodes in the FVS use a different message 
passing scheme than other nodes. More specifically, the algorithm we develop consists of several stages. 
In the first stage on the cycle-free graph (i.e., that excluding the FVS) we employ standard inference 
algorithms such as BP but in a non-standard manner: Incorrect estimates for the nodes in the cycle-free 
portion are computed while other quantities are calculated and then fed back to the FVS. In the second 
stage, nodes in the FVS use these quantities to perform exact mean and variance computations in the FVS
and to produce quantities used to initiate the third stage of BP processing on the cycle-free portion in
order to correct the means and variances. If the number of feedback nodes is bounded, the means and
variances can be obtained exactly in linear time by using FMP. In general, the complexity is O(k^2 n),
where k is the number of feedback nodes and n is the total number of nodes.

For graphs with large feedback vertex sets (e.g., for large two-dimensional grids), FMP becomes 
intractable. We develop approximate FMP using a pseudo-FVS (i.e., a set of nodes of moderate size 
that break some but not all of the cycles). The resulting algorithm has the same structure as the exact 
algorithm except that the inference algorithm on the remainder of the graph (excluding the pseudo-FVS),
which contains cycles, needs to be specified. In this paper we simply use LBP, although any other
inference algorithm could also be used. As we will show, assuming convergence of LBP on the remaining 
graph, the resulting algorithm always yields the correct means and variances on the pseudo-FVS, and the 
correct means elsewhere. Using these results and ideas motivated by the work on walk-summability (WS) 
[16], we develop simple rules for selecting nodes for the pseudo-FVS in order to ensure and enhance
convergence of LBP in the remaining graph (by ensuring WS in the remaining graph) and high accuracy 
(by ensuring that our algorithm "collects the most significant walks"; see Section II-C for more details).
This pseudo-FVS selection algorithm allows us to trade off efficiency and accuracy in a simple and
natural manner. Experimental results suggest that this algorithm performs exceedingly well, including
for non-WS models for which LBP on the entire graph fails catastrophically, using a pseudo-FVS of
size no larger than log(n).

Inference algorithms based on dividing the nodes of a graphical model into subsets have been explored
previously [18], [19]. The approach presented in this paper is distinguished by the fact that our methods
can be naturally modified to provide efficient approximate algorithms with theoretical analysis of
convergence and error bounds.

The remainder of the paper is organized as follows. In Section II, we first introduce some basic
concepts in graph theory and Gaussian graphical models. Then we briefly review BP, LBP, and walk-sum
analysis. We also define the notion of an FVS and state some relevant results from the literature.
In Section III, we show that for a class of graphs with small FVS, inference problems can be solved
efficiently and exactly by FMP. We start with the single feedback node case, and illustrate the algorithm
using a concrete example. Then we describe the general algorithm with multiple feedback nodes. We
also prove that the algorithm converges and produces correct estimates of the means and variances. In
Section IV, we introduce approximate FMP, where we use a pseudo-FVS of bounded size. We also present
theoretical results on convergence and accuracy of approximate FMP. Then we provide an algorithm for
selecting a good pseudo-FVS. In Section V, we present numerical results. The experiments are performed
on two-dimensional grids, which are widely used in various research areas including image processing.
We design a series of experiments to analyze the convergence and accuracy of approximate FMP. We
also compare the performance of the algorithm with different choices of pseudo-FVS, and demonstrate
that excellent performance can be achieved with a pseudo-FVS of modest size chosen in the manner
we describe. Finally in Section VI, we conclude with a discussion of our main contributions and future
research directions. 

II. Background 

A. Gaussian Graphical Models 

The set of conditional independence relationships among a collection of random variables can be
represented by a graphical model [20]. An undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ consists of a set of nodes
(or vertices) $\mathcal{V}$ and a set of edges $\mathcal{E}$. Each node $s \in \mathcal{V}$ corresponds to a random variable $x_s$. We
say that a set $C \subset \mathcal{V}$ separates sets $A, B \subset \mathcal{V}$ if every path connecting $A$ and $B$ passes through
$C$. The random vector $x_{\mathcal{V}}$¹ is said to be Markov with respect to $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ if for any subsets $A$, $B$,
$C \subset \mathcal{V}$ where $C$ separates $A$ and $B$, we have that $x_A$ and $x_B$ are independent conditioned on $x_C$, i.e.,
$p(x_A, x_B \mid x_C) = p(x_A \mid x_C)\, p(x_B \mid x_C)$. Such Markov models on undirected graphs are also commonly
referred to as undirected graphical models or Markov random fields.

(a) The sparsity pattern of the underlying graph

(b) The sparsity pattern of the information matrix

Fig. 1. The relationship between the sparsity pattern in the underlying graph and the sparsity pattern in the information matrix
of a Gaussian graphical model. Conditional independences can be directly read from either the sparsity pattern of the graph
structure or the sparsity pattern of the information matrix.

In a Gaussian graphical model, the random vector $x_{\mathcal{V}}$ is jointly Gaussian. The probability density
function of a jointly Gaussian distribution is given by $p(x) \propto \exp\{-\tfrac{1}{2} x^T J x + h^T x\}$, where $J$ is the
information, concentration, or precision matrix and $h$ is the potential vector. We refer to these parameters
as the model parameters in information form. The mean vector $\mu$ and covariance matrix $P$ are related
to $J$ and $h$ by $\mu = J^{-1} h$ and $P = J^{-1}$. For Gaussian graphical models, the graph structure is sparse
with respect to the information matrix $J$, i.e., for $i \neq j$, $J_{ij} \neq 0$ if and only if there is an edge between $i$ and
$j$. For example, Figure 1(a) is the underlying graph for the information matrix $J$ with sparsity pattern
shown in Figure 1(b). For a non-degenerate Gaussian distribution, $J$ is positive definite. The conditional
independences of a collection of Gaussian random variables can be read immediately from the graph
as well as from the sparsity pattern of the information matrix. If $J_{ij} = 0$, $i \neq j$, then $x_i$ and $x_j$ are
independent conditioned on all other variables [21]. Inference in Gaussian graphical models refers to the
problem of estimating the means $\mu_i$ and variances $P_{ii}$ of every random variable $x_i$ given $J$ and $h$.



¹ We use the notation $x_A$, where $A \subset \mathcal{V}$, to denote the collection of random variables $\{x_s \mid s \in A\}$.
B. Belief Propagation and Loopy Belief Propagation

BP is a message passing algorithm for solving inference problems in graphical models. Messages are 
updated at each node according to incoming messages from neighboring nodes and local parameters. It 
is known that for tree-structured graphical models, BP runs in linear time (in the cardinality n = |V| 
of the node set) and is exact. When there are cycles in the graph, LBP is used instead, where the same 
local message update rules as BP are used neglecting the existence of cycles. However, convergence and 
correctness are not guaranteed when there are cycles. 

In Gaussian graphical models, the set of messages can be represented by $\{\Delta J_{i \to j} \cup \Delta h_{i \to j}\}_{(i,j) \in \mathcal{E}}$.
Consider a Gaussian graphical model: $p(x) \propto \exp\{-\tfrac{1}{2} x^T J x + h^T x\}$. BP (or LBP) proceeds as follows
[8]:

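(1) Message Passing:

The messages are initialized as $\Delta J^{(0)}_{i \to j}$ and $\Delta h^{(0)}_{i \to j}$, for all $(i, j) \in \mathcal{E}$. These initializations may be
chosen in different ways. In our experiments we initialize all messages with the value 0.
At each iteration $t$, the messages are updated based on previous messages as

$$\Delta J^{(t)}_{i \to j} = -J_{ji} \big(\hat{J}^{(t-1)}_{i \setminus j}\big)^{-1} J_{ij}, \qquad \Delta h^{(t)}_{i \to j} = -J_{ji} \big(\hat{J}^{(t-1)}_{i \setminus j}\big)^{-1} \hat{h}^{(t-1)}_{i \setminus j}, \tag{2}$$

where

$$\hat{J}^{(t-1)}_{i \setminus j} = J_{ii} + \sum_{k \in \mathcal{N}(i) \setminus j} \Delta J^{(t-1)}_{k \to i}, \tag{3}$$

$$\hat{h}^{(t-1)}_{i \setminus j} = h_i + \sum_{k \in \mathcal{N}(i) \setminus j} \Delta h^{(t-1)}_{k \to i}. \tag{4}$$

Here $\mathcal{N}(i) = \{j \in \mathcal{V} : (i, j) \in \mathcal{E}\}$ denotes the set of neighbors of node $i$. The fixed-point messages
are denoted as $\Delta J_{i \to j}$ and $\Delta h_{i \to j}$ if the messages converge.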

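(2) Computation of Means and Variances:

The variances and means are computed based on the fixed-point messages as

$$\hat{J}_i = J_{ii} + \sum_{k \in \mathcal{N}(i)} \Delta J_{k \to i}, \tag{5}$$

$$\hat{h}_i = h_i + \sum_{k \in \mathcal{N}(i)} \Delta h_{k \to i}. \tag{6}$$

The variances and means can then be obtained by $P_{ii} = \hat{J}_i^{-1}$ and $\mu_i = \hat{J}_i^{-1} \hat{h}_i$.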
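The update rules (2)-(6) translate almost line for line into code. The following is a minimal NumPy sketch, not the authors' implementation: it assumes a synchronous (parallel) schedule, zero initial messages as in the text, and a hypothetical function name `gaussian_bp`; edges are read from the nonzero off-diagonal entries of $J$. On a tree it reproduces the exact means and variances; on a graph with cycles the same code is LBP.

```python
import numpy as np

def gaussian_bp(J, h, max_iter=500, tol=1e-12):
    """Synchronous Gaussian BP/LBP following Eqs. (2)-(6).

    J: symmetric information matrix (n x n); h: potential vector (n,).
    Returns (means, variances, converged).
    """
    n = len(h)
    nbrs = [[j for j in range(n) if j != i and J[i, j] != 0] for i in range(n)]
    edges = [(i, j) for i in range(n) for j in nbrs[i]]   # directed messages i -> j
    dJ = {e: 0.0 for e in edges}                          # zero initialization
    dh = {e: 0.0 for e in edges}
    converged = False
    for _ in range(max_iter):
        new_dJ, new_dh = {}, {}
        for i, j in edges:
            # Eqs. (3)-(4): local parameters plus messages from N(i) \ j.
            J_i_wo_j = J[i, i] + sum(dJ[(k, i)] for k in nbrs[i] if k != j)
            h_i_wo_j = h[i] + sum(dh[(k, i)] for k in nbrs[i] if k != j)
            # Eq. (2): new messages from i to j.
            new_dJ[(i, j)] = -J[j, i] * J[i, j] / J_i_wo_j
            new_dh[(i, j)] = -J[j, i] * h_i_wo_j / J_i_wo_j
        change = max((abs(new_dJ[e] - dJ[e]) + abs(new_dh[e] - dh[e]) for e in edges),
                     default=0.0)
        dJ, dh = new_dJ, new_dh
        if change < tol:
            converged = True
            break
    # Eqs. (5)-(6): combine all incoming messages at each node.
    J_hat = np.array([J[i, i] + sum(dJ[(k, i)] for k in nbrs[i]) for i in range(n)])
    h_hat = np.array([h[i] + sum(dh[(k, i)] for k in nbrs[i]) for i in range(n)])
    return h_hat / J_hat, 1.0 / J_hat, converged

# Chain 0-1-2-3 (a tree): BP is exact.
J_tree = np.array([[1.0, 0.3, 0.0, 0.0],
                   [0.3, 1.0, -0.4, 0.0],
                   [0.0, -0.4, 1.0, 0.2],
                   [0.0, 0.0, 0.2, 1.0]])
h = np.array([1.0, -0.5, 0.2, 0.7])
mu_tree, var_tree, ok_tree = gaussian_bp(J_tree, h)

# Closing the chain into a 4-cycle turns BP into LBP.
J_loop = J_tree.copy()
J_loop[0, 3] = J_loop[3, 0] = 0.3
mu_loop, var_loop, ok_loop = gaussian_bp(J_loop, h)
```

On the 4-cycle the LBP means agree with `np.linalg.solve(J_loop, h)`, while the variances differ from the diagonal of `np.linalg.inv(J_loop)`, in line with the discussion of LBP above.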
C. Walk-sum Analysis

Computing means and variances for a Gaussian graphical model corresponds to solving a set of linear 
equations and obtaining the diagonal elements of the inverse of J respectively. There are many ways 
in which to do this, e.g., by direct solution or by various iterative methods. As we outline in this
section, one way to interpret the exact or approximate solution of this problem is through walk-sum
analysis, which is based on a simple power series expansion of $J^{-1}$. In [16], [17], walk-sum analysis is
used to interpret the computations of means and variances formally as collecting all required "walks" in
a graph. The analysis in [16] identifies when LBP fails, in particular when the required walks cannot be
summed in arbitrary orders, i.e., when the model is not walk-summable². One of the important benefits
of walk-sum analysis is that it allows us to understand what various algorithms compute and relate them
to the required exact computations. For example, as shown in [16], LBP collects all of the required walks
for the computation of the means (and, hence, if it converges always yields the correct means) but only 
some of the walks required for variance computations for loopy graphs (so, if it converges, its variance 
calculations are not correct). 

For simplicity, in the rest of the paper, we assume without loss of generality that the information matrix 
$J$ has been normalized such that all its diagonal elements are equal to unity. Let $R = I - J$, and note
that $R$ has zero diagonal. The matrix $R$ is called the edge-weight matrix³.
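The normalization is without loss of generality because rescaling each variable by $\sqrt{J_{ii}}$ yields a model with unit diagonal, and the means and covariances of the original model are recovered by the same diagonal scaling. A small NumPy sketch (the function name `normalize` and the numbers are illustrative):

```python
import numpy as np

def normalize(J, h):
    """Rescale to unit diagonal. With d_i = 1 / sqrt(J_ii), the variables
    y_i = x_i / d_i have information matrix d_i J_ij d_j and potential d_i h_i."""
    d = 1.0 / np.sqrt(np.diag(J))
    return J * np.outer(d, d), h * d, d

J = np.array([[4.0, 1.0, 0.0],
              [1.0, 2.0, -0.5],
              [0.0, -0.5, 3.0]])
h = np.array([1.0, 2.0, -1.0])
Jn, hn, d = normalize(J, h)
R = np.eye(3) - Jn                 # edge-weight matrix, zero diagonal

# Solve the normalized problem, then map back to the original variables x = d * y.
mu = d * np.linalg.solve(Jn, hn)
P = np.linalg.inv(Jn) * np.outer(d, d)
```

The off-diagonal entries of `R` are $-J_{ij}/\sqrt{J_{ii}J_{jj}}$, which are the partial correlation coefficients mentioned in the footnote on $R$.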

A walk of length $l \geq 0$ is defined as a sequence of vertices $w = (w_0, w_1, w_2, \ldots, w_l)$ where each step
$(w_i, w_{i+1})$ is an edge in the graph. The weight of a walk is defined as the product of the edge weights,

$$\phi(w) = \prod_{i=1}^{l(w)} R_{w_{i-1}, w_i}, \tag{7}$$

where $l(w)$ is the length of walk $w$. Also, we define the weight of a zero-length walk, i.e., a single node,
as one.

By the Neumann power series for matrix inversion, the covariance matrix can be expressed as

$$P = J^{-1} = (I - R)^{-1} = \sum_{l=0}^{\infty} R^l. \tag{8}$$

This formal series converges (although not necessarily absolutely) if the spectral radius, $\rho(R)$, i.e., the
magnitude of the largest eigenvalue of $R$, is less than 1.
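As a quick numerical illustration of (8), assuming an arbitrary small symmetric $R$ chosen here with $\rho(R) < 1$, the truncated series approaches $J^{-1}$ (the helper name `neumann_partial_sum` is illustrative):

```python
import numpy as np

# Illustrative edge-weight matrix (zero diagonal); its absolute row sums are
# at most 0.7, so rho(R) <= 0.7 < 1 and the series (8) converges.
R = np.array([[0.0, 0.4, 0.0, 0.3],
              [0.4, 0.0, -0.2, 0.0],
              [0.0, -0.2, 0.0, 0.25],
              [0.3, 0.0, 0.25, 0.0]])
J = np.eye(4) - R
rho = np.max(np.abs(np.linalg.eigvalsh(R)))   # spectral radius of symmetric R

def neumann_partial_sum(R, L):
    """Return sum_{l=0}^{L} R^l, the series (8) truncated at l = L."""
    S = np.eye(len(R))
    term = np.eye(len(R))
    for _ in range(L):
        term = term @ R        # term = R^l
        S = S + term
    return S

P_approx = neumann_partial_sum(R, 200)
```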

² Walk-summability corresponds to the absolute convergence of the series corresponding to the walk-sums needed for variance
computation in a graphical model [16].

³ The matrix $R$, which has the same off-diagonal sparsity pattern as $J$, is a matrix of partial correlation coefficients: $R_{ij}$ is
the conditional correlation coefficient between $x_i$ and $x_j$ conditioned on all of the other variables in the graph.
Let $\mathcal{W}$ be a set of walks. We define the walk-sum of $\mathcal{W}$ as

$$\phi(\mathcal{W}) = \sum_{w \in \mathcal{W}} \phi(w). \tag{9}$$

We use $\phi(i \to j)$ to denote the sum of all walks from node $i$ to node $j$. In particular, we call $\phi(i \to i)$
the self-return walk-sum of node $i$.

It is easily checked that the $(i, j)$ entry of $R^l$ equals $\phi_l(i \to j)$, the sum of all walks of length $l$ from
node $i$ to node $j$. Hence

$$P_{ij} = \phi(i \to j) = \sum_{l=0}^{\infty} \phi_l(i \to j). \tag{10}$$

A Gaussian graphical model is walk-summable (WS) if for all $i, j \in \mathcal{V}$, the walk-sum $\phi(i \to j)$
converges for any order of the summands in (10) (note that the summation in (10) is ordered by walk
length).
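The identity $(R^l)_{ij} = \phi_l(i \to j)$ behind (10) can be checked by brute force: enumerate every length-$l$ vertex sequence from $i$ to $j$, weight it by (7), and compare with the matrix power. This sketch uses an illustrative triangle graph and a hypothetical helper `walk_sum_of_length`; sequences that use a non-edge contribute zero because the corresponding entries of $R$ (including its zero diagonal) vanish.

```python
import itertools
import numpy as np

# Illustrative edge weights on a triangle (R has zero diagonal).
R = np.array([[0.0, 0.5, 0.3],
              [0.5, 0.0, -0.4],
              [0.3, -0.4, 0.0]])

def walk_sum_of_length(R, i, j, l):
    """phi_l(i -> j): total weight, per Eq. (7), of all length-l walks from i to j."""
    if l == 0:
        return 1.0 if i == j else 0.0        # the zero-length walk has weight one
    total = 0.0
    # A length-l walk from i to j is fixed by its l - 1 intermediate vertices.
    for middle in itertools.product(range(len(R)), repeat=l - 1):
        w = (i, *middle, j)
        weight = 1.0
        for a, b in zip(w[:-1], w[1:]):
            weight *= R[a, b]                # zero if (a, b) is not an edge
        total += weight
    return total

# Largest discrepancy with the (i, j) entry of R^l over all pairs and l = 0..5.
max_gap = max(abs(walk_sum_of_length(R, i, j, l) - np.linalg.matrix_power(R, l)[i, j])
              for l in range(6) for i in range(3) for j in range(3))
```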

In walk-summable models, $\phi(i \to j)$ is well-defined for all $i, j \in \mathcal{V}$. The covariances and the means
can be expressed as

$$P_{ij} = \phi(i \to j), \tag{11}$$

$$\mu_i = \sum_{j \in \mathcal{V}} h_j P_{ij} = \sum_{j \in \mathcal{V}} h_j\, \phi(i \to j). \tag{12}$$

As shown in [16], for non-WS models, LBP may not converge and can, in fact, yield oscillatory variance
estimates that take on negative values.
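Equation (12) says that $\mu = \sum_l R^l h$ in a walk-summable model. One simple way to accumulate these walks by length is the fixed-point iteration $\mu \leftarrow h + R\mu$, which is the Jacobi iteration for $J\mu = h$ when $J$ has unit diagonal; after $m$ updates starting from $\mu = h$ it has collected every walk of length at most $m$. The matrix below is an illustrative choice whose absolute row sums are at most 0.7, so the model is walk-summable.

```python
import numpy as np

R = np.array([[0.0, 0.4, 0.0, 0.3],
              [0.4, 0.0, -0.2, 0.0],
              [0.0, -0.2, 0.0, 0.25],
              [0.3, 0.0, 0.25, 0.0]])
h = np.array([1.0, 0.0, -2.0, 0.5])
J = np.eye(4) - R

mu = h.copy()                  # walks of length 0 only
for m in range(300):
    mu = h + R @ mu            # now mu = sum_{l=0}^{m+1} R^l h
```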

Here we list some useful results from [16] that will be used in this paper.

1) The following conditions are equivalent to walk-summability:

(i) $\sum_{w \in \mathcal{W}_{i \to j}} |\phi(w)|$ converges for all $i, j \in \mathcal{V}$, where $\mathcal{W}_{i \to j}$ is the set of walks from $i$ to $j$.

(ii) $\rho(\bar{R}) < 1$, where $\bar{R}$ is the matrix whose elements are the absolute values of the corresponding
elements in $R$.

2) A Gaussian graphical model is walk-summable if it is attractive, i.e., every edge weight $R_{ij}$ is
nonnegative. The model is also walk-summable if the graph is cycle-free.

3) For a walk-summable Gaussian graphical model, LBP converges and gives the correct means.

4) In walk-summable models, the estimated variance from LBP for a node is the sum over all
backtracking walks⁴, which is a subset of all self-return walks needed for computing the correct
variance.
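Condition (ii) gives a direct numerical test. The sketch below (function names are illustrative) contrasts $\rho(R)$ with $\rho(\bar{R})$ on a 4-cycle whose edge weights have magnitude 0.6 with one negative edge: $J = I - R$ is a valid, positive-definite information matrix with $\rho(R) = 0.6\sqrt{2} \approx 0.85$, yet $\rho(\bar{R}) = 1.2$, so the model is not walk-summable. An attractive 4-cycle with weights 0.4 is included for comparison.

```python
import numpy as np

def spectral_radius(M):
    # M is symmetric here, so eigvalsh applies.
    return np.max(np.abs(np.linalg.eigvalsh(M)))

def is_walk_summable(J):
    """Condition (ii): rho(|R|) < 1, with R = I - J and J normalized to unit diagonal."""
    R = np.eye(len(J)) - J
    return spectral_radius(np.abs(R)) < 1

def cycle_model(weights):
    """Unit-diagonal J for a 4-cycle with R-weights on edges (0,1), (1,2), (2,3), (3,0)."""
    R = np.zeros((4, 4))
    for (a, b), w in zip([(0, 1), (1, 2), (2, 3), (3, 0)], weights):
        R[a, b] = R[b, a] = w
    return np.eye(4) - R

J_frustrated = cycle_model([0.6, 0.6, 0.6, -0.6])   # one negative edge weight
J_attractive = cycle_model([0.4, 0.4, 0.4, 0.4])    # all edge weights nonnegative
```

Result 3) above therefore guarantees LBP convergence for `J_attractive` but says nothing about `J_frustrated`.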

⁴ A backtracking walk of a node is a self-return walk that can be reduced consecutively to a single node. Each reduction is to
replace a subwalk of the form $\{i, j, i\}$ by the single node $\{i\}$. For example, a self-return walk of the form 12321 is backtracking,
but a walk of the form 1231 is not.
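The reduction in the backtracking-walk definition can be carried out in one left-to-right pass with a stack: collapsing $(i, j, i)$ to $(i)$ behaves like cancelling an edge against its immediate reversal, so the order of reductions does not change the fully reduced walk. A short Python sketch (the function name is illustrative):

```python
def is_backtracking(walk):
    """True if the self-return walk reduces to a single node by repeatedly
    replacing a subwalk (i, j, i) with (i)."""
    if not walk or walk[0] != walk[-1]:
        return False                       # not a self-return walk
    stack = []
    for v in walk:
        stack.append(v)
        # The last three entries form (i, j, i): collapse them to (i).
        if len(stack) >= 3 and stack[-1] == stack[-3]:
            del stack[-2:]
    return len(stack) == 1

print(is_backtracking([1, 2, 3, 2, 1]))    # True  (the walk 12321)
print(is_backtracking([1, 2, 3, 1]))       # False (the walk 1231)
```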
(a) A graph with an FVS of size one

(b) A graph with an FVS of size two

Fig. 2. Examples of FVS's of different sizes. After removing the nodes in an FVS and their incident edges, the remainder of
the graph is cycle-free.



D. Feedback Vertex Set 

A feedback vertex set (FVS), also called a loop cutset, is defined as a set of vertices whose removal
(with the removal of incident edges) results in a cycle-free graph [22]. For example, in Figure 2(a)
node 1 forms an FVS by itself since it breaks all cycles. In Figure 2(b), the set consisting of nodes 1
and 2 is an FVS. The problem of finding the FVS of the minimum size is called the minimum feedback
vertex set problem, which has been widely studied in graph theory and computer science. For a general
graph, the decision version of the minimum FVS problem, i.e., deciding whether there exists an FVS of
size at most k, has been proved to be NP-complete [23]. Finding the minimum FVS for general graphs
is still an active research area. To the best of the authors' knowledge, the fastest algorithm for finding
the minimum FVS runs in time O(1.7548^n), where n is the number of nodes [24].
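Verifying that a given vertex set is an FVS is easy even though finding a minimum one is hard: delete the set and test what remains for cycles. A minimal union-find sketch; the function `is_fvs` and the example fan graph (node 0 joined to every node of the path 1-2-3-4-5) are illustrative and not taken from the paper:

```python
def is_fvs(n, edges, F):
    """True if deleting the vertex set F (with incident edges) from the undirected
    simple graph on nodes 0..n-1 leaves a cycle-free graph (a forest)."""
    F = set(F)
    parent = list(range(n))                  # union-find over nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    for u, v in edges:
        if u in F or v in F:
            continue                         # edge removed together with F
        ru, rv = find(u), find(v)
        if ru == rv:
            return False                     # u and v already connected: cycle
        parent[ru] = rv
    return True

# Hypothetical fan graph: node 0 joined to every node of the path 1-2-3-4-5.
fan_edges = [(0, i) for i in range(1, 6)] + [(i, i + 1) for i in range(1, 5)]
print(is_fvs(6, fan_edges, {0}))     # True: every cycle passes through node 0
print(is_fvs(6, fan_edges, {3}))     # False: the cycle 0-1-2-0 survives
```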

Despite the difficulty of obtaining the minimal FVS, approximate algorithms have been proposed to
give an FVS whose size is bounded by a factor times the minimum possible size [25]-[27]. In [27],
the authors proposed an algorithm that gives an FVS of size at most two times the minimum size. The
complexity of this algorithm is O(min{m log n, n^2}), where m and n are respectively the number of
edges and vertices. In addition, if one is given prior knowledge of the graph structure, optimal or near
optimal solutions can be found efficiently or even in linear time for many special graph structures
[28]-[30]. Fixed-parameter polynomial-time algorithms have also been developed to find the minimum FVS if the
minimum size is known to be bounded by a parameter [31].
III. Exact Feedback Message Passing

In this section, we describe the exact FMP algorithm (or simply FMP), which gives the exact inference
results for all nodes. We initialize FMP by selecting an FVS, $\mathcal{F}$, using any one of the algorithms mentioned
in Section II-D. The nodes in the FVS are called feedback nodes.

We use a special message update scheme for the feedback nodes while using standard BP messages 
(although, as we will see, not in a standard way) for the non-feedback nodes. In FMP, two rounds of BP 
message passing are performed with different parameters. In the first round of BP, we obtain inaccurate 
"partial variances" and "partial means" for the nodes in the cycle-free graph as well as some "feedback 
gains" for the non-feedback nodes. Next we compute the exact inference results for the feedback nodes. 
In the second round of standard BP, we make corrections to the "partial variances" and "partial means" 
of the non-feedback nodes. Exact inference results are then obtained for all nodes. 

Before describing FMP, we introduce some notation. With a particular choice, $\mathcal{F}$, of FVS and with
$\mathcal{T} = \mathcal{V} \setminus \mathcal{F}$ as the remaining cycle-free graph, we can define submatrices and subvectors respectively of $J$
and $h$. In particular, let $J_{\mathcal{F}}$ denote the information matrix restricted to nodes of $\mathcal{F}$. For convenience
we assume we have ordered the nodes in the graph so that $\mathcal{F}$ consists of the first $k$ nodes in $\mathcal{V}$, so that
$J_{\mathcal{F}}$ corresponds to the upper-left $k \times k$ block of $J$, and similarly $J_{\mathcal{T}}$, the information matrix restricted
to nodes in $\mathcal{T}$, corresponds to the lower-right $(n - k) \times (n - k)$ block of $J$. We can also define $J_{\mathcal{T}\mathcal{F}}$,
the lower-left cross-information matrix, and its transpose (the upper-right cross-information matrix) $J_{\mathcal{F}\mathcal{T}}$.
Analogously we can define the subvectors $h_{\mathcal{F}}$ and $h_{\mathcal{T}}$. In addition, for the graph $\mathcal{G}$ and any node $j$, let
$\mathcal{N}(j)$ denote the neighbors of $j$, i.e., the nodes connected to $j$ by edges.

In this section we first describe FMP for the example in Figure 3(a), in which the FVS consists of a
single node. Then we describe the general FMP algorithm with multiple feedback nodes. We also prove 
the correctness and analyze the complexity. 



A. The Single Feedback Node Case 



Consider the loopy graph in Figure 3(a) and a Gaussian graphical model, with information matrix $J$
and potential vector $h$, defined on it. In this graph every cycle passes through node 1, and thus node 1 forms an FVS by
itself. We use $\mathcal{T}$ to denote the subgraph excluding node 1 and its incident edges. Graph $\mathcal{T}$ is a tree,
which does not have any cycles⁵. Using node 1 as the feedback node, FMP consists of the following
steps:

⁵ More generally, the cycle-free graph used in FMP can be a collection of disconnected trees, i.e., a forest.

Step 1: Initialization 

We construct an additional potential vector $h^1 = J_{\mathcal{T},1}$ on $\mathcal{T}$, i.e., $h^1$ is the submatrix (column vector)
of $J$ with column index 1 and row indices corresponding to $\mathcal{T}$. Note that, since in this case $\mathcal{F} = \{1\}$,
this new potential vector is precisely $J_{\mathcal{T}\mathcal{F}}$. In the general case $J_{\mathcal{T}\mathcal{F}}$ will consist of a set of columns,
one for each element of the FVS, where each of those columns is indexed by the nodes in $\mathcal{T}$. Note that
$h^1_i = J_{1i}$ for all $i \in \mathcal{N}(1)$ and $h^1_i = 0$ for all $i \notin \mathcal{N}(1) \cup \{1\}$. We can view this step as node 1 sending
messages to its neighbors to obtain $h^1$. See Figure 3(b) for an illustration.

Step 2: First Round of BP on $J_{\mathcal{T}}$ (Figure 3(c))



We now perform BP on $\mathcal{T}$ twice, both times using the information matrix $J_{\mathcal{T}}$, but with two different potential
vectors. The first of these is simply the original potential vector restricted to $\mathcal{T}$, i.e., $h_{\mathcal{T}}$. The second
uses $h^1$ as constructed in Step 1⁶. The former of these BP sweeps yields for each node
$i$ in $\mathcal{T}$ its "partial variance" $P^{\mathcal{T}}_{ii} = (J_{\mathcal{T}}^{-1})_{ii}$ and its "partial mean" $\mu^{\mathcal{T}}_i = (J_{\mathcal{T}}^{-1} h_{\mathcal{T}})_i$ by standard BP
message passing on $\mathcal{T}$. Note that these results are not the true variances and means since this step does
not involve the contributions of node 1. At the same time, BP using $h^1$ yields a "feedback gain" $g^1_i$⁷,
where $g^1_i = (J_{\mathcal{T}}^{-1} h^1)_i$, by standard BP on $\mathcal{T}$. Since $\mathcal{T}$ is a tree-structured graph, BP terminates in linear
time.

Step 3: Exact Inference for the Feedback Node 

Feedback node 1 collects the "feedback gains" from its neighbors as shown in Figure 3(d). Node 1
then calculates its exact variance and mean as follows:

$$P_{11} = \Big(J_{11} - \sum_{j \in \mathcal{N}(1)} J_{1j}\, g^1_j\Big)^{-1}, \tag{13}$$

$$\mu_1 = P_{11}\Big(h_1 - \sum_{j \in \mathcal{N}(1)} J_{1j}\, \mu^{\mathcal{T}}_j\Big). \tag{14}$$

In this step, all the computations involve only the parameters local to node 1, the "feedback gains"
from node 1's neighbors, and the "partial means" of node 1's neighbors.
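Steps 1-3 can be traced numerically on a small model. The sketch below assumes a hypothetical graph in which node 0 (playing the role of node 1 in the text, since Python indices start at 0) is connected to every node of the path 1-2-3-4-5, so every cycle passes through it; the edge weights are illustrative. For brevity, dense solves with $J_{\mathcal{T}}$ stand in for the BP runs of Step 2, which on a tree return the same quantities.

```python
import numpy as np

# Hypothetical model: node 0 is joined to every node of the path 1-2-3-4-5,
# so every cycle passes through node 0; F = {0} and T = {1, ..., 5} is a tree.
n = 6
J = np.eye(n)
for i in range(1, n):
    J[0, i] = J[i, 0] = 0.15            # edges to the feedback node
for i in range(1, n - 1):
    J[i, i + 1] = J[i + 1, i] = -0.3    # path (tree) edges
h = np.array([1.0, 0.5, -0.2, 0.0, 0.3, -1.0])

T = np.arange(1, n)
J_T, h_T = J[np.ix_(T, T)], h[T]

# Step 1: the extra potential vector is the column of J for node 0, restricted to T.
h1 = J[T, 0]

# Step 2 (dense solves stand in for BP on the tree T).
P_T_partial = np.diag(np.linalg.inv(J_T))   # partial variances
mu_T_partial = np.linalg.solve(J_T, h_T)    # partial means
g1 = np.linalg.solve(J_T, h1)               # feedback gains

# Step 3, Eqs. (13)-(14). J[0, T] is zero outside the neighbours of node 0,
# so summing over all of T equals summing over those neighbours.
P_00 = 1.0 / (J[0, 0] - J[0, T] @ g1)
mu_0 = P_00 * (h[0] - J[0, T] @ mu_T_partial)
```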



Step 4: Feedback Message Passing (Figure 3(e))



⁶ Note that both BP passes here (and, in the general case, the set of $k + 1$ BP passes in this step) use the same
information matrix. Hence there are economies in the actual BP message passing, as the variance computations are the same for
all.

⁷ The superscript 1 of $g^1_i$ means this feedback gain corresponds to the feedback node 1, notation we need in the general case.
After feedback node 1 obtains its own variance and mean, it passes the results to all other nodes in
order to correct their "partial variances" $P^{\mathcal{T}}_{ii}$ and "partial means" $\mu^{\mathcal{T}}_i$ computed in Step 2.
The neighbors of node 1 revise their node potentials as follows:

$$\widetilde{h}_j = \begin{cases} h_j - J_{1j}\, \mu_1, & \forall j \in \mathcal{N}(1), \\ h_j, & \forall j \notin \mathcal{N}(1). \end{cases} \tag{15}$$

From (15) we see that only node 1's neighbors revise their node potentials. The revised potential vector
$\widetilde{h}_{\mathcal{T}}$ and $J_{\mathcal{T}}$ are then used in the second round of BP.

Step 5: Second Round of BP on $J_{\mathcal{T}}$ (Figure 3(f))

We perform BP on $\mathcal{T}$ with $J_{\mathcal{T}}$ and $\widetilde{h}_{\mathcal{T}}$. The means $\mu_i = (J_{\mathcal{T}}^{-1} \widetilde{h}_{\mathcal{T}})_i$ obtained from this round of BP
are the exact means.

The exact variances can be computed by adding correction terms to the "partial variances" as

$$P_{ii} = P^{\mathcal{T}}_{ii} + P_{11}\, (g^1_i)^2, \quad \forall i \in \mathcal{T}, \tag{16}$$

where the "partial variance" $P^{\mathcal{T}}_{ii}$ and the "feedback gain" $g^1_i$ are computed in Step 2. There is only one
correction term in this single feedback node case. We will see that when the size of the FVS is larger than
one, there will be multiple correction terms.
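Putting Steps 1-5 together gives a complete single-feedback-node routine. This is a sketch under stated assumptions: the function name `fmp_single` and the fan-graph example are hypothetical, and dense solves with $J_{\mathcal{T}}$ replace the two BP rounds. On a tree, BP returns the same partial means, partial variances, and gains in linear time; the dense stand-in does not rely on the tree structure, but it also gives up the linear-time cost.

```python
import numpy as np

def fmp_single(J, h, f=0):
    """Single-feedback-node FMP, Steps 1-5, with feedback node f.
    Dense solves with J_T stand in for the two rounds of BP on the tree."""
    n = len(h)
    T = np.array([i for i in range(n) if i != f])
    J_T = J[np.ix_(T, T)]
    h1 = J[T, f]                                  # Step 1
    P_T_partial = np.diag(np.linalg.inv(J_T))     # Step 2: partial variances,
    mu_T_partial = np.linalg.solve(J_T, h[T])     #         partial means,
    g1 = np.linalg.solve(J_T, h1)                 #         feedback gains
    P_ff = 1.0 / (J[f, f] - h1 @ g1)              # Step 3: Eq. (13)
    mu_f = P_ff * (h[f] - h1 @ mu_T_partial)      #         Eq. (14)
    h_T_rev = h[T] - h1 * mu_f                    # Step 4: Eq. (15)
    mu, var = np.empty(n), np.empty(n)
    mu[T] = np.linalg.solve(J_T, h_T_rev)         # Step 5: second round
    var[T] = P_T_partial + P_ff * g1 ** 2         #         Eq. (16)
    mu[f], var[f] = mu_f, P_ff
    return mu, var

# Usage on a hypothetical fan graph: node 0 joined to every node of the path 1-2-3-4-5.
n = 6
J = np.eye(n)
for i in range(1, n):
    J[0, i] = J[i, 0] = 0.15
for i in range(1, n - 1):
    J[i, i + 1] = J[i + 1, i] = -0.3
h = np.array([1.0, 0.5, -0.2, 0.0, 0.3, -1.0])
mu, var = fmp_single(J, h, f=0)
```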

B. Feedback Message Passing for General Graphs 

For a general graph, the removal of a single node may not break all cycles. Hence, the FVS may 
consist of multiple nodes. In this case, the FMP algorithm for a single feedback node can be generalized 
by adding extra feedback messages, where each extra message corresponds to one extra feedback node 
in the FVS. 

Assume an FVS, $\mathcal{F}$, has been selected, and, as indicated previously, we order the nodes such that
$\mathcal{F} = \{1, \ldots, k\}$. The FMP algorithm with multiple feedback nodes is essentially the same as the FMP
algorithm with a single feedback node. When there are $k$ feedback nodes, we compute $k$ sets of feedback
gains, each corresponding to one feedback node. More precisely, Steps 1 and 2 in the algorithm now involve
performing BP on $\mathcal{T}$ $k + 1$ times, all with the same information matrix, $J_{\mathcal{T}}$, but with different potential
vectors, namely $h_{\mathcal{T}}$ and $h^p$, $p = 1, \ldots, k$, where the $h^p$ are the successive columns of $J_{\mathcal{T}\mathcal{F}}$. To obtain the
exact inference results for the feedback nodes, we then need to solve an inference problem on a smaller
graph, namely $\mathcal{F}$, of size $k$, so that Step 3 in the algorithm becomes one of solving a $k$-dimensional
linear system. Step 4 then is simply modified from the single-node case to provide a revised potential
vector on $\mathcal{T}$ taking into account corrections from each of the nodes in the FVS. Step 5 then involves a
single sweep of BP on $\mathcal{T}$ using this revised potential vector to compute the exact means on $\mathcal{T}$, and the
feedback gains, together with the variance computation on the FVS, provide corrections to the partial
variances for each node in $\mathcal{T}$. The general FMP algorithm with a given FVS $\mathcal{F}$ is summarized in Figure 4.

(c) First round of BP

(f) Second round of BP

Fig. 3. The FMP algorithm with a single feedback node
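The general algorithm summarized in Figure 4 can be sketched as below. Assumptions, not from the paper: the names `tree_bp` and `fmp` are hypothetical; `tree_bp` runs the Gaussian BP updates (2)-(6) with n synchronous sweeps, which is enough for exact results on a forest but is not the linear-time two-pass schedule; and because all $k + 1$ BP runs of Steps 1-2 use the same information matrix $J_{\mathcal{T}}$, the sketch passes their potential vectors as columns of one matrix so that the $J$-messages are computed once. The 7-node example with FVS {0, 1} is illustrative.

```python
import numpy as np

def tree_bp(J, H):
    """Gaussian BP, Eqs. (2)-(6), on a cycle-free graph for several potential
    vectors at once (columns of H, shape n x m). Returns (diag(J^-1), J^-1 H)."""
    n, m = H.shape
    nbrs = [[j for j in range(n) if j != i and J[i, j] != 0] for i in range(n)]
    edges = [(i, j) for i in range(n) for j in nbrs[i]]
    dJ = {e: 0.0 for e in edges}
    dh = {e: np.zeros(m) for e in edges}
    for _ in range(n):                      # n synchronous sweeps suffice on a forest
        new_dJ, new_dh = {}, {}
        for i, j in edges:
            J_i_wo_j = J[i, i] + sum(dJ[(k, i)] for k in nbrs[i] if k != j)
            h_i_wo_j = H[i] + sum(dh[(k, i)] for k in nbrs[i] if k != j)
            new_dJ[(i, j)] = -J[j, i] * J[i, j] / J_i_wo_j
            new_dh[(i, j)] = -J[j, i] * h_i_wo_j / J_i_wo_j
        dJ, dh = new_dJ, new_dh
    J_hat = np.array([J[i, i] + sum(dJ[(k, i)] for k in nbrs[i]) for i in range(n)])
    h_hat = np.array([H[i] + sum(dh[(k, i)] for k in nbrs[i]) for i in range(n)])
    return 1.0 / J_hat, h_hat / J_hat[:, None]

def fmp(J, h, F):
    """Exact FMP (Figure 4) for a feedback vertex set F (list of node indices)."""
    n = len(h)
    F = list(F)
    T = [i for i in range(n) if i not in F]
    J_T, J_M = J[np.ix_(T, T)], J[np.ix_(T, F)]   # columns of J_M are h^1, ..., h^k
    # Steps 1-2: BP on T with potential vectors h_T, h^1, ..., h^k.
    P_T_partial, X = tree_bp(J_T, np.column_stack([h[T], J_M]))
    mu_T_partial, G = X[:, 0], X[:, 1:]           # partial means, feedback gains
    # Step 3: the size-k problem on F.
    P_F = np.linalg.inv(J[np.ix_(F, F)] - J_M.T @ G)
    mu_F = P_F @ (h[F] - J_M.T @ mu_T_partial)
    # Step 4: revised potential vector on T.
    h_T_rev = h[T] - J_M @ mu_F
    # Step 5: second BP round for the means, correction terms for the variances.
    _, X2 = tree_bp(J_T, h_T_rev[:, None])
    mu, var = np.empty(n), np.empty(n)
    mu[T], mu[F] = X2[:, 0], mu_F
    var[T] = P_T_partial + np.einsum('ip,pq,iq->i', G, P_F, G)
    var[F] = np.diag(P_F)
    return mu, var

# Illustrative 7-node model: removing F = {0, 1} leaves the path 2-3-4-5-6.
n = 7
edges = [(2, 3, 0.2), (3, 4, -0.2), (4, 5, 0.2), (5, 6, 0.2),
         (0, 1, 0.2), (0, 2, 0.2), (0, 4, -0.2), (1, 3, 0.2), (1, 6, -0.2)]
J = np.eye(n)
for a, b, w in edges:
    J[a, b] = J[b, a] = w
h = np.arange(1.0, n + 1)
mu, var = fmp(J, h, F=[0, 1])
```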

C. Correctness and Complexity of FMP 

In this subsection, we analyze the correctness and computational complexity of FMP.

Theorem 1. The feedback message passing algorithm described in Figure 4 results in the exact means
and exact variances for all nodes.



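Proof: To make the notation in what follows somewhat less cluttered, let $J_M = J_{\mathcal{T}\mathcal{F}}$ so that we can
write

$$J = \begin{bmatrix} J_{\mathcal{F}} & J_M^T \\ J_M & J_{\mathcal{T}} \end{bmatrix} \quad \text{and} \quad h = \begin{bmatrix} h_{\mathcal{F}} \\ h_{\mathcal{T}} \end{bmatrix}. \tag{17}$$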
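Input: information matrix $J$, potential vector $h$, and feedback vertex set $\mathcal{F}$ of size $k$
Output: mean $\mu_i$ and variance $P_{ii}$ for every node $i$

1. Construct $k$ extra potential vectors: $\forall p \in \mathcal{F}$, $h^p = J_{\mathcal{T},p}$, each corresponding to one feedback node.

2. Perform BP on $\mathcal{T}$ with $J_{\mathcal{T}}, h_{\mathcal{T}}$ to obtain $P^{\mathcal{T}}_{ii} = (J_{\mathcal{T}}^{-1})_{ii}$ and $\mu^{\mathcal{T}}_i = (J_{\mathcal{T}}^{-1} h_{\mathcal{T}})_i$ for each $i \in \mathcal{T}$. With
the $k$ extra potential vectors, calculate the feedback gains $g^1_i = (J_{\mathcal{T}}^{-1} h^1)_i, g^2_i = (J_{\mathcal{T}}^{-1} h^2)_i, \ldots, g^k_i =
(J_{\mathcal{T}}^{-1} h^k)_i$ for $i \in \mathcal{T}$ by BP.

3. Obtain a size-$k$ subgraph with $\widehat{J}_{\mathcal{F}}$ and $\widehat{h}_{\mathcal{F}}$ given by

$$(\widehat{J}_{\mathcal{F}})_{pq} = J_{pq} - \sum_{j \in \mathcal{N}(p) \cap \mathcal{T}} J_{pj}\, g^q_j, \quad \forall p, q \in \mathcal{F},$$

$$(\widehat{h}_{\mathcal{F}})_p = h_p - \sum_{j \in \mathcal{N}(p) \cap \mathcal{T}} J_{pj}\, \mu^{\mathcal{T}}_j, \quad \forall p \in \mathcal{F},$$

and solve the inference problem on the small graph by $P_{\mathcal{F}} = \widehat{J}_{\mathcal{F}}^{-1}$ and $\mu_{\mathcal{F}} = \widehat{J}_{\mathcal{F}}^{-1} \widehat{h}_{\mathcal{F}}$.

4. Revise the potential vector on $\mathcal{T}$ by

$$\widetilde{h}_i = h_i - \sum_{j \in \mathcal{N}(i) \cap \mathcal{F}} J_{ij}\, (\mu_{\mathcal{F}})_j, \quad \forall i \in \mathcal{T}.$$

5. Another round of BP with the revised potential vector $\widetilde{h}_{\mathcal{T}}$ gives the exact means for nodes in $\mathcal{T}$.
Add correction terms to obtain the exact variances for nodes in $\mathcal{T}$:

$$P_{ii} = P^{\mathcal{T}}_{ii} + \sum_{p \in \mathcal{F}} \sum_{q \in \mathcal{F}} g^p_i\, (P_{\mathcal{F}})_{pq}\, g^q_i, \quad \forall i \in \mathcal{T}.$$

Fig. 4. The FMP algorithm with a given FVS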



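Similarly, we can write

$$P = \begin{bmatrix} P_{\mathcal{F}} & P_M^T \\ P_M & P_{\mathcal{T}} \end{bmatrix} \quad \text{and} \quad \mu = \begin{bmatrix} \mu_{\mathcal{F}} \\ \mu_{\mathcal{T}} \end{bmatrix}. \tag{18}$$

By the construction of $h^1, h^2, \ldots, h^k$ in FMP and (17),

$$J_M = [h^1, h^2, \ldots, h^k]. \tag{19}$$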

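The feedback gains $g^1, g^2, \ldots, g^k$ in FMP are computed by BP with $h^1, h^2, \ldots, h^k$ as potential vectors.
Since BP gives the exact means on trees,

$$[g^1, g^2, \ldots, g^k] = J_{\mathcal{T}}^{-1} [h^1, h^2, \ldots, h^k] = J_{\mathcal{T}}^{-1} J_M. \tag{20}$$

In FMP, $\mu^{\mathcal{T}}$ is computed by BP with potential vector $h_{\mathcal{T}}$, so

$$\mu^{\mathcal{T}} = J_{\mathcal{T}}^{-1} h_{\mathcal{T}}. \tag{21}$$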

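The diagonal of $J_{\mathcal{T}}^{-1}$ is also calculated exactly in the first round of BP in FMP as $P^{\mathcal{T}}_{ii} = (J_{\mathcal{T}}^{-1})_{ii}$.
Since $P = J^{-1}$, by matrix computations, we have

$$P_{\mathcal{T}} = J_{\mathcal{T}}^{-1} + (J_{\mathcal{T}}^{-1} J_M)\, P_{\mathcal{F}}\, (J_{\mathcal{T}}^{-1} J_M)^T. \tag{22}$$

Substituting (20) into (22), we have

$$P_{ii} = P^{\mathcal{T}}_{ii} + \sum_{p \in \mathcal{F}} \sum_{q \in \mathcal{F}} g^p_i\, (P_{\mathcal{F}})_{pq}\, g^q_i, \quad \forall i \in \mathcal{T}, \tag{23}$$

where $P^{\mathcal{T}}_{ii}$ is the "partial variance" of node $i$ and $g^p_i$ is the "feedback gain" in FMP. Here $P_{\mathcal{F}}$ is the exact
covariance matrix of the feedback nodes in $\mathcal{F}$. This is the same equation as in Step 5 of FMP. We need
to show that $P_{\mathcal{F}}$ is indeed calculated exactly in FMP.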
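By Schur's complement,

$$\widehat{J}_{\mathcal{F}} = P_{\mathcal{F}}^{-1} = J_{\mathcal{F}} - J_M^T J_{\mathcal{T}}^{-1} J_M \quad \text{and} \quad \widehat{h}_{\mathcal{F}} = P_{\mathcal{F}}^{-1} \mu_{\mathcal{F}} = h_{\mathcal{F}} - J_M^T J_{\mathcal{T}}^{-1} h_{\mathcal{T}}. \tag{24}$$

By (20) and (21),

$$\widehat{J}_{\mathcal{F}} = J_{\mathcal{F}} - J_M^T [g^1, g^2, \ldots, g^k] \quad \text{and} \quad \widehat{h}_{\mathcal{F}} = h_{\mathcal{F}} - J_M^T \mu^{\mathcal{T}}, \tag{25}$$

which is exactly the same formula as in Step 3 of FMP. Therefore, we obtain the exact covariance matrix
and exact means for nodes in $\mathcal{F}$ by solving $P_{\mathcal{F}} = \widehat{J}_{\mathcal{F}}^{-1}$ and $\mu_{\mathcal{F}} = P_{\mathcal{F}} \widehat{h}_{\mathcal{F}}$.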
Since $\mu = J^{-1}h$, from (17) and (18) we can get

$$\mu_T = J_T^{-1}(h_T - J_M\mu_F). \qquad (26)$$

We define $\tilde h_T = h_T - J_M\mu_F$, i.e.,

$$(\tilde h_T)_i = h_i - \sum_{j \in N(i)\cap F} J_{ij}\,(\mu_F)_j, \quad \forall i \in T, \qquad (27)$$

where $\mu_F$ is the exact mean of the nodes in $F$. Computing $\mu_T$ is therefore equivalent to performing BP with parameters $J_T$ and the revised potential vector $\tilde h_T$, as in Steps 4 and 5 of FMP. This completes the proof. ■
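The proof rests on the block identities (22), (24), and (26). A numerical check on an arbitrary dense positive-definite $J$ (a test matrix, not one of the paper's models), with the nodes of $F$ ordered first as in (17), makes them concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 2                                      # the first k indices play the role of F
A = rng.standard_normal((n, n))
J = A @ A.T + n * np.eye(n)                      # arbitrary symmetric positive-definite J
h = rng.standard_normal(n)
J_F, J_M, J_T = J[:k, :k], J[k:, :k], J[k:, k:]  # J = [[J_F, J_M'], [J_M, J_T]] as in (17)
h_F, h_T = h[:k], h[k:]

P = np.linalg.inv(J)
mu = P @ h
P_F, P_T, mu_F, mu_T = P[:k, :k], P[k:, k:], mu[:k], mu[k:]
G = np.linalg.solve(J_T, J_M)                    # J_T^{-1} J_M = [g^1, ..., g^k], eq. (20)

ok22 = np.allclose(P_T, np.linalg.inv(J_T) + G @ P_F @ G.T)          # (22)
ok24a = np.allclose(np.linalg.inv(P_F), J_F - J_M.T @ G)             # (24), left
ok24b = np.allclose(np.linalg.solve(P_F, mu_F),
                    h_F - J_M.T @ np.linalg.solve(J_T, h_T))         # (24), right
ok26 = np.allclose(mu_T, np.linalg.solve(J_T, h_T - J_M @ mu_F))     # (26)
print(ok22, ok24a, ok24b, ok26)                  # True True True True
```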
We now analyze the computational complexity of FMP, with $k$ denoting the size of the FVS and $n$ the total number of nodes in the graph. In Step 1 and Step 2, BP is performed on $T$ with $k + 2$ messages (one for $J_T$, one for $h_T$, and one for each $h^p$). The total complexity is $O(k(n - k))$. In Step 3, $O(k^2(n - k))$ computations are needed to obtain $\hat J_F$ and $\hat h_F$, and $O(k^3)$ operations to solve the inference problem on a graph of size $k$. In Step 4 and Step 5, it takes $O(k(n - k))$ computations to give the exact means and $O(k^2(n - k))$ computations to add the correction terms. Therefore, since $k \le n$, the computational complexity of FMP is $O(k^2 n)$. This is a significant reduction from the $O(n^3)$ of direct matrix inversion when $k$ is small.
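As a rough illustration of the savings (a hypothetical example that ignores constants and lower-order terms), take $n = 10^4$ nodes and an FVS of size $k = 10$:

```latex
k^2 n = 10^2 \cdot 10^4 = 10^6
\qquad\text{versus}\qquad
n^3 = \left(10^4\right)^3 = 10^{12},
\qquad\text{a ratio of } 10^{6}.
```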

IV. Approximate Feedback Message Passing 

As we have seen from Theorem 1, FMP always gives correct inference results. However, FMP is intractable if the size of the FVS is very large. This motivates our development of approximate FMP, which uses a pseudo-FVS instead of an FVS.

A. Approximate FMP with a Pseudo-FVS 

There are at least two steps in FMP that are computationally intensive when $k$, the size of the FVS, is large: solving a size-$k$ inference problem in Step 3 and adding $k^2$ correction terms to each non-feedback node in Step 5. One natural approximation is to use a set of feedback nodes of smaller size. We define a pseudo-FVS as a subset of an FVS that does not break all the cycles. A useful pseudo-FVS has a small size, but breaks the most "crucial" cycles in terms of the resulting inference errors. We will discuss how to select a good pseudo-FVS in Section IV-D. In this subsection, we assume that a pseudo-FVS is given.

Consider a Gaussian graphical model Markov on a graph $G = (V, E)$. We use $F$ to denote the given pseudo-FVS, and use $T$ to denote the pseudo-tree (i.e., a graph with cycles) obtained by eliminating the nodes in $F$ from $G$. With a slight abuse of terminology, we still refer to the nodes in $F$ as the feedback nodes. A natural extension is to replace BP by LBP in Step 2 and Step 5 of FMP.
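A minimal sketch of approximate FMP, under these assumptions: the inner solver for Steps 2 and 5 is a textbook scalar Gaussian LBP in information form with parallel updates, no damping, and a fixed iteration cap (`gaussian_lbp`, `approx_fmp`, `iters`, and `tol` are names introduced here for illustration), and each extra potential vector gets its own LBP run for simplicity, although the precision messages are the same for all of them. Applied with a true FVS, the same function computes exact FMP, since LBP on a cycle-free graph is exact BP.

```python
import numpy as np

def gaussian_lbp(J, h, iters=500, tol=1e-12):
    """Scalar Gaussian LBP in information form; returns (means, variances)."""
    n = J.shape[0]
    nbrs = [np.flatnonzero((J[i] != 0) & (np.arange(n) != i)) for i in range(n)]
    dJ = np.zeros((n, n))   # dJ[i, j]: precision message from node i to node j
    dh = np.zeros((n, n))   # dh[i, j]: potential message from node i to node j
    for _ in range(iters):
        new_dJ, new_dh = np.zeros((n, n)), np.zeros((n, n))
        for i in range(n):
            J_in = J[i, i] + dJ[:, i].sum()      # all messages into i
            h_in = h[i] + dh[:, i].sum()
            for j in nbrs[i]:
                J_excl = J_in - dJ[j, i]         # leave out the message from j
                h_excl = h_in - dh[j, i]
                new_dJ[i, j] = -J[j, i] * J[i, j] / J_excl
                new_dh[i, j] = -J[j, i] * h_excl / J_excl
        change = max(np.abs(new_dJ - dJ).max(), np.abs(new_dh - dh).max())
        dJ, dh = new_dJ, new_dh
        if change < tol:
            break
    J_hat = np.diag(J) + dJ.sum(axis=0)
    return (h + dh.sum(axis=0)) / J_hat, 1.0 / J_hat

def approx_fmp(J, h, F, iters=500):
    """Approximate FMP with a non-empty pseudo-FVS F; LBP replaces BP on T."""
    n = J.shape[0]
    F = list(F)
    T = [i for i in range(n) if i not in set(F)]
    J_T, J_M, J_F = J[np.ix_(T, T)], J[np.ix_(T, F)], J[np.ix_(F, F)]
    mu_part, var_part = gaussian_lbp(J_T, h[T], iters)             # Step 2
    G = np.column_stack([gaussian_lbp(J_T, J_M[:, p], iters)[0]     # feedback gains
                         for p in range(len(F))])
    P_F = np.linalg.inv(J_F - J_M.T @ G)                           # Step 3
    mu_F = P_F @ (h[F] - J_M.T @ mu_part)
    mu_T, _ = gaussian_lbp(J_T, h[T] - J_M @ mu_F, iters)          # Steps 4 and 5
    mu, var = np.empty(n), np.empty(n)
    mu[F], var[F] = mu_F, np.diag(P_F)
    mu[T] = mu_T
    var[T] = var_part + np.einsum("ip,pq,iq->i", G, P_F, G)        # correction terms
    return mu, var
```

On a walk-summable model, the returned means should match $J^{-1}h$ everywhere and the variances should be exact on $F$. For example, on a 3 x 3 grid with the center node as the pseudo-FVS, the remaining graph is an 8-cycle, so the variances on it are only approximate.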

The total complexity of approximate FMP depends on the size of the graph, the cardinality of the 
pseudo-FVS, and the number of iterations of LBP within the pseudo-tree. Let k be the size of the 
pseudo-FVS, n be the number of nodes, m be the number of edges in the graph, and D be the maximum 
number of iterations in Step 2 and Step 5. By a similar analysis as for FMP, the total computational 
complexity for approximate FMP is $O(k^2 n + kmD)$. Assuming that we are dealing with relatively sparse graphs, so that $m = O(n)$, reductions in complexity as compared to the use of a full FVS rely on both $k$
and D being of moderate size. Of course the choices of those quantities must also take into account the 
tradeoff with the accuracy of the computations. 

Of course, one can insert other algorithms for Steps 2 and 5, e.g., iterative algorithms such as embedded trees [17], which can yield exact answers. However, here we focus on the use of LBP for simplicity.
B. Convergence and Accuracy

In this subsection, we provide theoretical results on convergence and accuracy of approximate FMP. We 
first provide a result assuming convergence that makes several crucial points, namely on the exactness of 
means throughout the entire graph, the exactness of variances on the pseudo-FVS, and on the interpretation 
of the variances on the remainder of the graph as augmenting the LBP computation with a rich set of 
additional walks, roughly speaking those that go through the pseudo-FVS: 

Theorem 2. Consider a Gaussian graphical model with parameters $J$ and $h$. If approximate FMP converges with a pseudo-FVS $F$, it gives the correct means for all nodes and the correct variances on the pseudo-FVS. The variance of node $i$ in $T$ calculated by this algorithm equals the sum of all the backtracking walks of node $i$ within $T$ plus all the self-return walks of node $i$ that visit $F$, so that the only walks missed in the computation of the variance at node $i$ are the non-backtracking walks within $T$.

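Proof: We have

$$J = \begin{bmatrix} J_F & J_M' \\ J_M & J_T \end{bmatrix} \quad\text{and}\quad h = \begin{bmatrix} h_F \\ h_T \end{bmatrix}. \qquad (28)$$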

By Result 3) in Section II-C, when LBP converges, it gives the correct means. Hence, after convergence, for $i = 1, 2, \ldots, k$, we have

$$g^i = J_T^{-1}h^i \quad\text{and}\quad \mu^T = J_T^{-1}h_T,$$

where $g^i$ is the feedback gain corresponding to feedback node $i$ and $\mu^T$ is the partial mean in approximate FMP. These quantities are exact after convergence.

Since $g^i$ and $\mu^T$ are computed exactly, following the same steps as in the proof of Theorem 1, we can obtain the exact means and variances for nodes in $F$.

From the proof of Theorem 1, we also have

$$\mu_T = J_T^{-1}(h_T - J_M\mu_F). \qquad (29)$$

We have shown that $\mu_F$ is computed exactly in Step 3 of approximate FMP, so $h_T - J_M\mu_F$ is computed exactly. Since LBP on $T$ gives the exact means for any potential vector, the means of all nodes in $T$ are exact.

As in the proof of Theorem 1, the exact covariance matrix on $T$ is given by

$$P_T = J_T^{-1} + (J_T^{-1}J_M)\,P_F\,(J_T^{-1}J_M)'. \qquad (30)$$
As noted previously, the exact variance of node $i \in T$ equals the sum of all the self-return walks of node $i$. We partition these walks into two classes: self-return walks of node $i$ within $T$, and self-return walks that visit at least one node in $F$. The diagonal of $J_T^{-1}$ captures exactly the first class of walks. Hence, the second term on the right-hand side of (30) corresponds to the sum of the second class of walks. Let us compare each of these terms to what is computed by the approximate FMP algorithm. By the walk-sum results for LBP in Section II-C, LBP on $T$ gives the sums of all the backtracking walks after convergence, so the first term in (30) is approximated by backtracking walks. However, note that the terms $J_T^{-1}J_M$ and $P_F$ are obtained exactly. Hence, the approximate FMP algorithm computes the second term exactly and thus provides precisely the second set of walks. As a result, the only walks missing from the exact computation of variances in $T$ are non-backtracking walks within $T$. This completes the proof. ■
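The walk-sum facts used in this proof can be checked numerically on a toy model. For a walk-summable model with unit diagonal, $J = I - R$ and $J^{-1} = \sum_{l \ge 0} R^l$, where entry $(i, i)$ of $R^l$ is the total weight of the length-$l$ self-return walks at node $i$. The 4-cycle and its edge weight below are arbitrary choices for illustration.

```python
import numpy as np

# Unit-diagonal model on a 4-cycle: J = I - R, with an arbitrarily chosen edge weight r.
r = 0.3
R = np.zeros((4, 4))
for i in range(4):
    R[i, (i + 1) % 4] = R[(i + 1) % 4, i] = r
rho = np.max(np.abs(np.linalg.eigvalsh(np.abs(R))))
assert rho < 1                                   # walk-summable (here rho = 2r = 0.6)

P = np.linalg.inv(np.eye(4) - R)

# Sum of R^l for l = 0..L; entry (i, i) of R^l is the total weight of the
# length-l self-return walks at node i.
L = 200
S, R_l = np.zeros((4, 4)), np.eye(4)
for _ in range(L + 1):
    S += R_l
    R_l = R_l @ R
print(np.allclose(S, P))                         # True
```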
We now state several conditions under which we can guarantee convergence. 

Proposition 1. Consider a Gaussian graphical model with graph $G = (V, E)$ and model parameters $J$ and $h$. If the model is walk-summable, approximate FMP converges for any pseudo-FVS $F \subset V$.

Proof: Let $R = I - J$ and $(\bar R)_{ij} = |R_{ij}|$. In approximate FMP, LBP is performed on the pseudo-tree induced by $T = V \setminus F$. The information matrix on the pseudo-tree is $J_T$, which is a principal submatrix of $J$. By Corollary 8.1.20 in [32], for any $T$,

$$\rho(\bar R_T) \le \rho(\bar R) < 1. \qquad (31)$$

By Result 3) in Section II-C, LBP on $T$ is guaranteed to converge. All other computations in approximate FMP terminate in a finite number of steps. Hence, approximate FMP converges for any pseudo-FVS $F \subset V$. ■
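Inequality (31) says that deleting nodes, i.e., passing to a principal submatrix of $\bar R$, cannot increase the spectral radius. A small randomized check with arbitrary symmetric nonnegative matrices illustrates this; `spectral_radius` is a helper introduced here, and the 0.6 keep-probability is an arbitrary choice.

```python
import numpy as np

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

rng = np.random.default_rng(1)
for _ in range(100):
    n = 10
    A = rng.uniform(-1, 1, (n, n))
    R_bar = np.abs((A + A.T) / 2)              # |R| for a symmetric R
    np.fill_diagonal(R_bar, 0.0)               # unit-diagonal J => R has zero diagonal
    keep = rng.random(n) < 0.6                 # nodes left in T after removing a random F
    if keep.any():
        R_T = R_bar[np.ix_(keep, keep)]        # principal submatrix on T
        assert spectral_radius(R_T) <= spectral_radius(R_bar) + 1e-9
print("rho(R_T) <= rho(R) held in every trial")
```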

For the remainder of the paper we will refer to the quantities in (31) as the spectral radii of the corresponding graphs (in this case $T$ and the original graph $G$). Walk-summability on the entire graphical model is actually far stronger than is needed for approximate FMP to converge. As the proof of Proposition 1 suggests, all we really need is for the graphical model on the graph excluding the pseudo-FVS to be walk-summable. As we will discuss in Section IV-D, this objective provides one of the drivers for a very simple algorithm for choosing a pseudo-FVS in order to enhance the walk-summability of the remaining graph as well as the accuracy of the resulting LBP variance computations.

Note that the columns of the former are just the feedback gains computed by LBP for each of the additional potential vectors on $T$ corresponding to the columns of $J_M$, which we have already seen are computed exactly, as is the covariance on the pseudo-FVS.
Remarks: The following two results follow directly from Proposition 1.

1) Consider a walk-summable Gaussian graphical model. Let $F_j$ be a pseudo-FVS consisting of $j$ nodes, with $F_1 \subset F_2 \subset \cdots \subset F_k \subset F$, where $F$ is an FVS. Then $W^{LBP}_i \subseteq W^{F_1}_i \subseteq W^{F_2}_i \subseteq \cdots \subseteq W^{F_k}_i \subseteq W^{F}_i$ for any node $i$ in the graph. Here $W^{LBP}_i$ is the set of walks captured by LBP for calculating the variance of node $i$; $W^{F_j}_i$ is the set of walks captured by approximate FMP with pseudo-FVS $F_j$; and $W^{F}_i$ is the set of walks captured by FMP with FVS $F$.

2) Consider an attractive Gaussian graphical model (i.e., one in which all elements of $R$ are nonnegative). Let $F_1 \subset F_2 \subset \cdots \subset F_k \subset F$ denote the pseudo-FVSs (FVS), and let $P^{LBP}_{ii}, P^{F_1}_{ii}, \ldots, P^{F_k}_{ii}, P^{F}_{ii}$ denote the corresponding variances calculated for node $i$ by LBP, approximate FMP, and FMP, respectively. $P_{ii}$ represents the exact variance of node $i$. We have $P^{LBP}_{ii} \le P^{F_1}_{ii} \le P^{F_2}_{ii} \le \cdots \le P^{F_k}_{ii} \le P^{F}_{ii} = P_{ii}$ for any node $i$ in $V$.

The above results show that with approximate FMP, we can effectively trade off complexity and 
accuracy by selecting pseudo-FVS of different sizes. 

C. Error Bounds for Variance Computation 

We define the measure of the error of an inference algorithm for Gaussian graphical models as the average absolute error of variances over all nodes:

$$\epsilon = \frac{1}{n}\sum_{i \in V}\left|\hat P_{ii} - P_{ii}\right|, \qquad (32)$$

where $n$ is the number of nodes, $\hat P_{ii}$ is the variance of node $i$ computed by the algorithm, and $P_{ii}$ is the exact variance of node $i$.

Proposition 2. Consider a walk-summable Gaussian graphical model with $n$ nodes. Assume the information matrix $J$ is normalized to have unit diagonal. Let $\epsilon_{FMP}$ denote the error of approximate FMP and $\hat P^{FMP}_{ii}$ denote the estimated variance of node $i$. Then

$$\epsilon_{FMP} = \frac{1}{n}\sum_{i \in V}\left|\hat P^{FMP}_{ii} - P_{ii}\right| \le \frac{n-k}{n}\,\frac{\rho^g}{1-\rho},$$

where $k$ is the number of feedback nodes, $\rho$ is the spectral radius corresponding to the subgraph $T$, and $g$ denotes the girth of $T$, i.e., the length of the shortest cycle in $T$. In particular, when $k = 0$, i.e., LBP is used on the entire graph, we have

$$\epsilon_{LBP} = \frac{1}{n}\sum_{i \in V}\left|\hat P^{LBP}_{ii} - P_{ii}\right| \le \frac{\rho^g}{1-\rho},$$

where the notation is similarly defined.

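Some of the following proof techniques are motivated by the proof of the error bound on determinant estimation with the so-called orbit-product representation in [33].

Proof: By Theorem 2,

$$\epsilon_{LBP} = \frac{1}{n}\sum_{i \in V}\left|\hat P^{LBP}_{ii} - P_{ii}\right| = \frac{1}{n}\sum_{i \in V}\left|\phi(i \to i)\right|, \qquad (33)$$

where $\phi(i \to i)$ denotes the sum of all non-backtracking self-return walks of node $i$. We have

$$\frac{1}{n}\sum_{i \in V}\left|\phi(i \to i)\right| \le \frac{1}{n}\sum_{i \in V}\bar\phi(i \to i), \qquad (34)$$

where $\bar\phi(\cdot)$ denotes the sum of the absolute weights of walks, i.e., walk-sums defined on $\bar R$.

Non-backtracking self-return walks must contain at least one cycle, so the minimum length of a non-backtracking self-return walk is $g$, the minimum length of a cycle. Thus

$$\epsilon_{LBP} \le \frac{1}{n}\sum_{i \in V}\bar\phi(i \to i) \le \frac{1}{n}\sum_{i \in V}\sum_{m=g}^{\infty}(\bar R^m)_{ii} \qquad (35)$$

$$= \frac{1}{n}\,\mathrm{Tr}\Big(\sum_{m=g}^{\infty}\bar R^m\Big) = \frac{1}{n}\sum_{m=g}^{\infty}\mathrm{Tr}(\bar R^m). \qquad (36)$$

Let $\lambda_i(\cdot)$ denote the $i$th largest eigenvalue of a matrix. Since $\lambda_i(\bar R^m) = \lambda_i(\bar R)^m$ and $|\lambda_i(\bar R)| \le \rho$, we have

$$\mathrm{Tr}(\bar R^m) = \sum_{i=1}^{n}\lambda_i(\bar R)^m \le n\rho^m. \qquad (37)$$

Therefore,

$$\epsilon_{LBP} \le \frac{1}{n}\sum_{m=g}^{\infty}n\rho^m = \frac{\rho^g}{1-\rho}. \qquad (38)$$

When approximate FMP is used with a size-$k$ pseudo-FVS, the variances of nodes in the pseudo-FVS are computed exactly, while the variance errors for the other nodes are the same as those of LBP performed on the subgraph excluding the pseudo-FVS. Therefore,

$$\epsilon_{FMP} = \frac{1}{n}\sum_{i \in V}\left|\hat P^{FMP}_{ii} - P_{ii}\right| = \frac{1}{n}\sum_{i \in T}\left|\hat P^{FMP}_{ii} - P_{ii}\right| \qquad (39)$$

$$= \frac{n-k}{n}\,\epsilon_{LBP}(T) \le \frac{n-k}{n}\,\frac{\rho^g}{1-\rho}, \qquad (40)$$

where $\epsilon_{LBP}(T)$ denotes the error of LBP on the $(n-k)$-node subgraph $T$. ■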

An immediate conclusion of Proposition 2 is that if a graph is cycle-free (i.e., $g = \infty$), the error $\epsilon_{LBP}$ is zero.
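The bound of Proposition 2 is straightforward to evaluate once $\rho$ and the girth $g$ of $T$ are known. A minimal sketch, assuming the graph is given as adjacency lists of a simple undirected graph: the girth is found by breadth-first search from every vertex (an $O(nm)$ method chosen for brevity, not one prescribed by the paper), and `girth` and `fmp_error_bound` are hypothetical helper names. A cycle-free $T$ gives $g = \infty$ and a bound of zero, matching the remark above.

```python
import math
from collections import deque

def girth(adj):
    """Shortest-cycle length of a simple undirected graph given as adjacency lists."""
    best = math.inf
    for s in range(len(adj)):                  # BFS from every vertex
        dist, parent = {s: 0}, {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v], parent[v] = dist[u] + 1, u
                    queue.append(v)
                elif parent[u] != v:
                    # a non-tree edge closes a cycle of length at most dist[u] + dist[v] + 1
                    best = min(best, dist[u] + dist[v] + 1)
    return best

def fmp_error_bound(rho, g, n, k=0):
    """Right-hand side of Proposition 2 (requires rho < 1); k = 0 gives the LBP bound."""
    if g == math.inf:
        return 0.0                             # cycle-free T: no non-backtracking self-return walks
    return (n - k) / n * rho ** g / (1 - rho)

square = [[1, 3], [0, 2], [1, 3], [0, 2]]      # 4-cycle
path = [[1], [0, 2], [1]]                      # 3-node path
print(girth(square), girth(path))              # 4 inf
```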
We can also analyze the performance of FMP on a Gaussian graphical model that is Markov on an Erdős–Rényi random graph $G(n, c/n)$. Each edge in such a random graph with $n$ nodes appears with probability $c/n$, independent of every other edge in the graph [34].

Proposition 3. Consider a sequence of graphs $\{G_n\}_{n=1}^{\infty}$ drawn from the Erdős–Rényi model $G(n, c/n)$ with fixed $c$. Suppose we have a sequence of Gaussian graphical models parameterized by $\{(J_n, h_n)\}_{n=1}^{\infty}$ that are Markov on $\{G_n\}_{n=1}^{\infty}$ and are strictly walk-summable (i.e., the spectral radii $\rho(\bar R_n)$ are uniformly upper bounded away from unity). Then asymptotically almost surely there exists a sequence of pseudo-FVSs $\{F_n\}_{n=1}^{\infty}$, with $F_n$ of size $O(\log n)$, with which the error of approximate FMP as in (32) approaches zero.

Proof: We can obtain a graph with girth greater than $l$ by removing one node from every cycle of length up to $l$. The number of cycles of length up to $l$ in $G(n, c/n)$ is $O(c^l)$ asymptotically almost surely (Corollary 4.9 in [34]). So we can obtain a graph of girth $\log\log n$ by removing $O(\log n)$ nodes. By (31), the spectral radius $\rho$ of each remaining subgraph is at most $\rho(\bar R_n)$, which is bounded away from one uniformly in $n$, while the girth $g$ grows without bound. Hence, by Proposition 2, the bound $\frac{n-k}{n}\frac{\rho^g}{1-\rho}$, and with it the error, approaches zero as $n$ approaches infinity. ■

D. Finding a good Pseudo-FVS of Bounded Size 

One goal of choosing a good pseudo-FVS is to ensure that LBP converges on the remaining subgraph;
the other goal is to obtain smaller inference errors. In this subsection we discuss a local selection criterion 
motivated by these two goals and show that the two goals are consistent. 

Let $\bar R$ denote the absolute edge weight matrix. Since $\rho(\bar R) < 1$ is a sufficient condition for LBP to converge on graph $G$, obtaining convergence reduces to removing the minimum number of nodes such that $\rho(\bar R_T) < 1$ for the remaining graph $T$. However, searching and checking this condition over all possible pseudo-FVSs up to a desired cardinality is prohibitively expensive; instead, we seek a local method (i.e., one using only quantities associated with individual nodes) for choosing nodes for our pseudo-FVS, one at a time, to enhance convergence. The principal motivation for our approach is the following bound [32] on the spectral radius of a nonnegative matrix:



$$\rho(\bar R) \le \max_{i}\sum_{j}\bar R_{ij}. \qquad (41)$$

We further simplify this problem by a greedy heuristic: one feedback node is chosen at each iteration. This provides a basis for a simple greedy method for choosing nodes for our pseudo-FVS. In particular, at each stage, we examine the graph excluding the nodes already included in the pseudo-FVS and select the node with the largest sum of edge weights, i.e., $\arg\max_{i}\sum_{j \in N(i)}\bar R_{ij}$. We then remove the node from the graph and put it into $F$. We continue the same procedure on the remaining graph until the maximum allowed size $k$ of $F$ is reached or the remaining graph does not have any cycles.

Input: information matrix $J$ and the maximum size $k$ of the pseudo-FVS
Output: a pseudo-FVS $F$

1. Let $F = \emptyset$ and normalize $J$ to have unit diagonal.

2. Repeat until $|F| = k$ or the remaining graph is empty.

(a) Clean up the current graph by eliminating all the tree branches.

(b) Update the scores $s(i) = \sum_{j \in N(i)} |J_{ij}|$.

(c) Put the node with the largest score into $F$ and remove it from the current graph.

Fig. 5. The pseudo-FVS selection criterion
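The selection procedure of Fig. 5 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: "eliminating tree branches" is read as repeatedly deleting nodes with at most one remaining neighbor (such nodes cannot lie on a cycle, and nothing is left exactly when the remaining graph is cycle-free), ties in the score are broken by the smallest index, the function name `select_pseudo_fvs` is hypothetical, and dense $O(n^2)$ scans are used for brevity instead of the bookkeeping needed to reach $O(km)$.

```python
import numpy as np

def select_pseudo_fvs(J, k):
    """Greedy pseudo-FVS selection in the spirit of Fig. 5; returns node indices."""
    d = np.sqrt(np.diag(J))
    W = np.abs(J / np.outer(d, d))          # |J_ij| after normalizing J to unit diagonal
    np.fill_diagonal(W, 0.0)
    alive = set(range(J.shape[0]))
    F = []
    while len(F) < k and alive:
        # (a) prune tree branches: repeatedly drop nodes with at most one live neighbor
        pruned = True
        while pruned:
            pruned = False
            for i in list(alive):
                if sum(1 for j in alive if W[i, j] > 0) <= 1:
                    alive.discard(i)
                    pruned = True
        if not alive:
            break                            # the remaining graph is cycle-free
        # (b) scores s(i) = sum over live neighbors j of |J_ij|
        idx = sorted(alive)
        scores = W[np.ix_(idx, idx)].sum(axis=1)
        # (c) move the highest-scoring node into F
        best = idx[int(np.argmax(scores))]
        F.append(best)
        alive.discard(best)
    return F
```

On a 3 x 3 grid with equal edge weights, for instance, the center node (four neighbors) is chosen first; after it is removed, one more node breaks the remaining 8-cycle and everything else is pruned, so the procedure stops with two nodes even if a larger $k$ is allowed.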

The selection algorithm is summarized in Figure 5. Note that while the motivation just given for this method is to enhance convergence of LBP on $T$, we are also enhancing the accuracy of the resulting algorithm, as Proposition 2 suggests, since the spectral radius $\rho(\bar R_T)$ is reduced with the removal of nodes (by (31), it can never increase). In addition, as shown in Theorem 2, the only approximation our algorithm makes is in the computation of variances for nodes in $T$, and those errors correspond to non-backtracking self-return walks confined to $T$ (i.e., we do capture non-backtracking self-return walks that exit $T$ and visit nodes in the pseudo-FVS). Thus, as we proceed with our selection of nodes for our pseudo-FVS, it makes sense to choose the nodes with the largest edge weights to nodes that are left in $T$, which is precisely what this approach accomplishes.

The complexity of the selection algorithm is $O(km)$, where $m$ is the number of edges and $k$ is the size of the pseudo-FVS. As a result, constructing a pseudo-FVS in this manner is computationally simple and negligible compared to the inference algorithm that then exploits it.

Finding a suitable pseudo-FVS is important. We will see in Section V that there is a huge performance difference between a good selection and a bad selection of $F$. In addition, experimental results show that with a good choice of pseudo-FVS (using the algorithm just described), we not only can get excellent convergence and accuracy results but can do this with a pseudo-FVS of cardinality $k$ and a number of iterations $D$ that scale well with the graph size $n$. Empirically, we find that we only need $O(\log n)$ feedback nodes as well as very few iterations to obtain excellent performance, and thus the complexity is $O(n\log^2 n)$.
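Substituting these empirical scalings into the cost $O(k^2 n + kmD)$ of approximate FMP, assuming sparse graphs ($m = O(n)$), $k = O(\log n)$, and a number of iterations $D$ that stays bounded (our reading of "very few iterations"), gives:

```latex
O\!\left(k^2 n + kmD\right)
  = O\!\left(n \log^2 n + n \log n\right)
  = O\!\left(n \log^2 n\right).
```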
V. Numerical Results

In this section, we apply approximate FMP to graphical models that are Markov on two-dimensional 
grids and present results detailing the convergence and correctness of our proposed algorithm. Two- 
dimensional grids are sparse since each node is connected to a maximum of four neighbors. There have 
been many studies of inference problems on grids [35]. However, inference cannot, in general, be solved exactly in linear time due to the existence of many cycles of various lengths. It is known that the size of the FVS for a grid grows linearly with the number of nodes on the grid [36]. Hence, we use approximate
FMP with a pseudo-FVS of bounded size to ensure that inference is tractable. 

In our simulations, we consider $l \times l$ grids with different values of $l$. The size of the graph is thus $n = l^2$. We randomly generate an information matrix $J$ that has the sparsity pattern corresponding to a grid. Its nonzero off-diagonal entries are drawn from an i.i.d. uniform distribution with support in $[-1, 1]$. We ensure $J$ is positive definite by adding $\lambda I$ for sufficiently large $\lambda$. We also generate a potential vector $h$, whose entries are drawn i.i.d. from a uniform distribution with support in $[-1, 1]$. Without loss of generality, we then normalize the information matrix to have unit diagonal.
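A sketch of this model-generation recipe, under assumptions the text leaves open: $\lambda$ is set to the smallest value that makes $J$ positive definite plus a small margin (`margin`, a hypothetical parameter), node $(r, c)$ of the grid is indexed as $rl + c$, and $h$ is rescaled along with $J$ so that the normalized model describes the same variables up to scaling. The function name `random_grid_model` is ours.

```python
import numpy as np

def random_grid_model(l, margin=0.01, seed=None):
    """Random information matrix J on an l x l grid and potential vector h."""
    rng = np.random.default_rng(seed)
    n = l * l
    J = np.zeros((n, n))
    for r in range(l):
        for c in range(l):
            i = r * l + c                                  # node (r, c)
            if c + 1 < l:
                J[i, i + 1] = J[i + 1, i] = rng.uniform(-1, 1)   # horizontal edge
            if r + 1 < l:
                J[i, i + l] = J[i + l, i] = rng.uniform(-1, 1)   # vertical edge
    lam = -np.linalg.eigvalsh(J).min() + margin            # J + lam * I is positive definite
    J += lam * np.eye(n)
    h = rng.uniform(-1, 1, n)
    d = np.sqrt(np.diag(J))
    # Unit-diagonal normalization: rescale variables by sqrt(J_ii), so J -> D^-1/2 J D^-1/2.
    return J / np.outer(d, d), h / d
```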

A. Convergence of Approximate FMP 

In Figure 6, we illustrate our pseudo-FVS selection procedure, removing one node at a time, for a graphical model constructed as just described on a 10 x 10 grid. The remaining graphs after removing 0, 1, 2, 3, 4, and 5 nodes, and their corresponding spectral radii $\rho(\bar R)$, are shown in the figure. LBP does not converge on the entire graph, and the corresponding spectral radius is $\rho(\bar R) = 1.0477$. When one feedback node is chosen, the spectral radius corresponding to the remaining graph is reduced to 1.0415. After removing one more node from the graph, the spectral radius is further reduced to 0.97249, which ensures convergence. In all experiments on 10 x 10 grids, we observe that by choosing only a few nodes (at most three empirically) for our pseudo-FVS, we can obtain convergence even if LBP on the original graph diverges.

In Figure 7, we show that the spectral radius and its upper bound given in (41) decrease when more nodes are included in the pseudo-FVS. Convergence of approximate FMP is immediately guaranteed when the spectral radius is less than one.

B. Accuracy of Approximate FMP 

In this subsection, we show numerical results of the inference errors defined in (32). On each grid, LBP and the approximate FMP algorithms with two different sets of feedback nodes are performed. One set has $k = \lceil \log n \rceil$ feedback nodes while the other has $k = \sqrt{n}$ feedback nodes. The horizontal axis shows the number of message-passing iterations. The vertical axis shows the errors for both variances and means on a logarithmic scale.

[Figure: remaining graphs after removing 0 to 5 nodes; recovered panel titles include "3 nodes removed: ρ = 0.95638", "4 nodes removed: ρ = 0.95631", "5 nodes removed: ρ = 0.86673"]
Fig. 6. Size of the pseudo-FVS and the spectral radius of the corresponding remaining graph

In Figures 8 to 12, numerical results are shown for 10 x 10, 20 x 20, 40 x 40 and 80 x 80 grids, respectively. Except for the model in Figure 8, LBP fails to converge for all models. With $k = \lceil \log n \rceil$ feedback nodes, approximate FMP converges for all the grids and gives much better accuracy than LBP. In Figure 8, where LBP converges on the original graph, we obtain more accurate variances and improved convergence rates using approximate FMP. In Figures 9 to 12, LBP diverges while approximate FMP gives inference results with small errors. When $k = \sqrt{n}$ feedback nodes are used, we obtain even better approximations but with more computations in each iteration. We performed approximate FMP on different graphs with different parameters, and empirically observed that $k = \lceil \log n \rceil$ feedback nodes seem to be sufficient to give a convergent algorithm and good approximations.

Remarks: The question, of course, arises as to whether it is simply the size of the pseudo-FVS that is important. However, numerical results show that approximate FMP does not give satisfactory results if we choose a "bad" pseudo-FVS. In Figure 13, we present results to demonstrate that the approximate FMP algorithm with a badly selected pseudo-FVS indeed performs poorly. The pseudo-FVS is selected by the opposite criterion of the algorithm in Figure 5, i.e., the node with the smallest score is selected at each iteration. We can see that the LBP, 7-FVS, and 40-FVS algorithms all fail to converge. These results suggest that when a suitable set of feedback nodes is selected, we can leverage the graph structure and model parameters to dramatically improve the quality of inference in Gaussian graphical models.

The error of means is defined in the same manner as that of variances: the average of the absolute errors of the means over all nodes.

Here we use shorthand terminology, where $k$-FVS refers to running our approximate FMP algorithm with a pseudo-FVS of cardinality $k$.

[Figure: spectral radius and its bound vs. number of selected feedback nodes; panels (a) 10 x 10 grid, (b) 20 x 20 grid, (c) 40 x 40 grid, (d) 80 x 80 grid]
Fig. 7. Number of selected feedback nodes vs. the spectral radius and its bound



[Figures 8 to 13 (plots). Recovered panel titles: (a) Evolution of variance errors with iterations; (b) Evolution of mean errors with iterations. Horizontal axis: Iterations. Legend in Fig. 8: LBP, 5-FVS, 10-FVS.]
Fig. 8. Inference errors of a 10 x 10 grid
Fig. 9. Inference errors of a 10 x 10 grid
Fig. 10. Inference errors of a 20 x 20 grid
Fig. 11. Inference errors of a 40 x 40 grid
Fig. 12. Inference errors of an 80 x 80 grid
Fig. 13. Inference errors with a bad selection of feedback nodes

January 20, 2013

VI. Conclusions and Future Directions

In this paper we have developed the feedback message passing algorithm, where we first identify a set of feedback nodes. The algorithm structure involves first employing BP algorithms on the remaining graph (excluding the FVS), although with several different sets of node potentials at nodes that are neighbors of the FVS; then using the results of these computations to perform exact inference on the FVS; and then employing BP on the remaining graph again in order to correct the answers on those nodes to yield exact answers. The feedback message passing algorithm solves the inference problem exactly in a Gaussian graphical model in linear time if the graph has an FVS of bounded size. For a graph with a large FVS, we propose an approximate feedback message passing algorithm that chooses a smaller "pseudo-FVS" and replaces BP on the remaining graph with its loopy counterpart LBP. We provide theoretical results that show that, assuming convergence of the LBP, we still obtain exact inference results (means and variances) on the pseudo-FVS, exact means on the entire graph, and approximate variances on the remaining nodes that have precise interpretations in terms of the additional "walks" that are collected as compared to LBP on the entire graph. We also provide bounds on accuracy, and these, together with an examination of the walk-summability condition, provide an algorithm for choosing nodes to include in the pseudo-FVS. Our experimental results demonstrate that these algorithms lead to excellent performance (including for models in which LBP diverges) with pseudo-FVS size that grows only logarithmically with graph size.

There are many future research directions based on the ideas of this paper. For example, a more extensive study of the performance of approximate FMP on random graphs is of great interest. In addition, as we have pointed out, LBP is only one possibility for the inference algorithm used on the remaining graph after a pseudo-FVS is chosen. One intriguing possibility is to use approximate FMP itself on this remaining graph, i.e., nesting applications of this algorithm. This is currently under investigation, as is the use of these algorithmic constructs for other important problems, including the learning of graphical models with small FVSs and using an FVS or pseudo-FVS for efficient sampling of Gaussian graphical models.